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Abstract 



There has been a tremendous growth in publicly available digital video footage over 
the past decade. This has necessitated the development of new techniques in computer 
vision geared towards efficient analysis, storage and retrieval of such data. Many mid- 
level computer vision tasks such as segmentation, object detection, tracking, etc. involve 
an inference problem based on the video data available. Video data has a high degree of 
spatial and temporal coherence. The property must be intelligently leveraged in order 
to obtain better results. 

Graphical models, such as Markov Random Fields, have emerged as a powerful tool 
for such inference problems. They are naturally suited for expressing the spatial de- 
pendencies present in video data, It is however, not clear, how to extend the existing 
techniques for the problem of inference over time. This thesis explores the Path Proba- 
bility Method, a variational technique in statistical mechanics, in the context of graphical 
models and approximate inference problems. It extends the method to a general frame- 
work for problems involving inference in time, resulting in an algorithm, DynBP. We 
explore the relation of the algorithm with existing techniques, and find the algorithm 
competitive with existing approaches. 

The main contribution of this thesis are the extended GBP algorithm, the extension 
of Path Probability Methods to the DynBP algorithm and the relationship between 
them. We have also explored some applications in computer vision involving temporal 
evolution with promising results. 
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Notation and Abbreviations 



Symbol Definition 



G undirected graph 

V vertex or node set of a graph 

E edge set of graph 

a cluster of a region graph 

c a counting number of cluster a 

x a state of the cluster 

b a (x. a ) belief that cluster a has state x a 
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Chapter 1 
Introduction 



There has been an exponential growth of digital video data publicly available during the 
past decade. This has posed new problems dealing with storage, analysis, classification, 
retrieval of videos. Therefore, making interesting observations about video data has 
become paramount in the field of computer vision. We have, so far, not clearly specified 
what we mean by an interesting observation. An interesting observation in video is 
largely dependent on the context of the problem. For example, in surveillance videos, 
detecting motion in video or whether a person is present or not might be important, 
whereas, in a video shot from a mobile phone, one might be interested in cleaning up 
the jitter that might be present in the video initially before archiving. 

Traditional image processing techniques have been successfully applied to video data. 
However, videos exhibit a strong degree of causality, i.e., the natural progression of 
the video in time. Several existing methods have successfully exploited the temporal 
information available in videos. 

One of the contributions of this thesis is a framework for inference for probabilistic 
models which exhibit causality and spatial correlation. This analysis is well-suited for 
inference in many computer vision problems. 

Graphical models [351 HE] and belief propagation algorithms [23, HI H3] have emerged 
as powerful tools for a variety of inference problems due to ease of implementation and 
applicability. They have proven of great appeal in computer vision as they can implicitly 
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capture the spatial correlations in image data. Belief propagation algorithms using MRF 
models [HI El H2] have been used in a number of computer vision problems such as image 
restoration [TT], tracking [12], stereo [30], inpainting [21], shape-matching [7], etc. 

Video data exhibits a strong degree of temporal coherence. Further, the change 
between successive video frames is usually small enough, (except in case of sharp changes, 
say between scenes) for most common footage. The data also exhibits a sharp degree of 
spatial coherence. For example, if a pixel is in motion, nearby pixels are more likely to 
be in motion. 

This thesis focusses on approximate inference over systems which are evolving with 
time. It also explores three computer vision problems that exhibit a high degree of 
spatio-temporal coherence and are well suited to application of the inference algorithm, 
DynBP. 

• Moving Object Detection 

• Video Denoising 

• Dropped frame reconstruction 

The thesis is organized as follows: Chapter [2] presents the necessary mathematical 
background of graphical models and variational algorithms, Chapter [3] we extend the 
path probability methods present in statistical mechanics and present the resultant al- 
gorithm, DynBP. We analyse the DynBP algorithm and compare it against existing 
approaches in Chapter [4| We apply our algorithm to computer vision problems and 
show the experiments and results in [5} Finally, Chapter [6] presents the conclusions and 
future work. 



Chapter 2 
Graphical Models 



Graphical models are a powerful framework for representing and manipulating probabil- 
ity distributions over sets of random variables. They provide a methodology for solving 
problems involving thousands of random variables that are linked in complex ways, and 
have found tremendous application in solving statistical problems in many fields such as 
bioinformatics |38j, computer vision [9], communication [10], speech processing [2], etc. 

In this we present a brief introduction to the existing literature in the field before 
focussing on the techniques that are relevant to this thesis. A more comprehensive 
description can be found in [THJ, [26] . 

2.1 Basics of graphical models 

Graphical models can be either directed ( as known as Bayesian networks) or undirected. 
In a graphical model, the edges of the underlying graph represent the probabilistic de- 
pendencies between variables. In this thesis, we consider only the undirected graphical 
models. 

Given a graph G = (V,E), a probabilistic graphical model is formed by associating 
with each node s G V a random variable x s taking values in the sample space X. 
This sample space can either be a continuum (e.g. X = 9ft), or the discrete alphabet 
X = {0, . . . , m — 1}. In this latter case, the underlying sample space X N is the set of all 
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Figure 2.1: A simple factor graph 



N vectors x = {x s \s G V} over m symbols, so that X N = m . 
The joint distribution of an undirected graphical is defined by 

pw = \ n m*c) (2.i) 

where C is the set of maximal cliques in the graph, ipc{ x c) is a potential function (a 
positive, but otherwise arbitrary, real-valued function) on the clique x c , and Z is the 
normalization factor 

2 = EII fc(xc) (2.2) 
x cec 

An undirected graphical model is often represented by a factor graph [25J. A factor graph 
is a bipartite graph, with its nodes V U {C G C}, and an edge between a variable node 
i s 6 y and a factor node ipc G C iff the variable is a member of the clique representing 
the factor, i.e., x s C x c . 



2.1.1 Example 



Consider the graphical model shown in |2.1.1[ The variables nodes 21,22,^3 and the 
factor nodes are /12 and ^23- The probability distribution is given by 



1 



p(xi,X 2 ,X 3 ) = ■=fa(Xx,X2)fa(X2,X 3 ) 



If each of the variables can take values in {0, 1}, the partition function is given by 



111 

Z = fl2(x 1 ,X 2 )f23(x 2 ,X 3 ) 

X1=0 £2=0 213=0 
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2.2 Inference 

A problem that arises in many applications of interest is that of estimating the random 
vector x = {x s \s G V} based on a set of noisy observations y = {y s \s G V}, For instance, 
in computer vision [TTJ [H], the vector x could represent an image defined on a grid, and 
y could represent a noisy or blurred version of the image. 
We are often interested in the following problems: 

1. The maximum a posteriori (MAP) estimate corresponding to finding the most 
likely state x, based on the noisy observations y - that is: 

^map = arg max p(x|y) (2.3) 

In absence of noisy information, the problem reduces to finding: argmax xCi ^iv p(x). 

2. Computing the marginal distribution of a subset of variables. For the example 



in 



2.1.1, one might be interested in p(xi), i.e., 



i i 

p{Xi = 0) = fafa = 0,X 2 )f23(x 2 ,X 3 ) 

X2=0 2:3=0 

3. Computing the partition function Z = J2x Ilcec ^c( x c) 

Several methods have been developed solve the problem exactly or approximately. 

• Exact inference 

• Sampling Methods 

• Variational Methods 

2.2.1 Exact Inference 

Exact inference computes the quantity of interest, say the marginal distribution at a 
node, by appealing to the distributive law [TJ. For example, in the factor graph given in 
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Example |2.1.1 the computation of the marginal at x\ would be according 



The order of elimination of the variables is critical to the performance of the algorithm 
and considerable work has been done in identifying the order. 

A common approach is graph triangulation, by inserting additional edges, resulting in 
the junction tree algorithm [16]. A junction tree has the running intersection property: 
If a node appears in any two cliques in the tree, it appears in all cliques that lie on the 
path between the two cliques. In a junction tree, because of the running intersection 
property, local consistency implies global consistency. 

The main difficulty with the exact approach is that the computation cost in expo- 
nential in the size of largest clique in the graph. Thus, exact inference is not possible on 
large sized graphs. This necessitates alternative methods, which can handle large graph 
sizes. 

Belief Propagation (BP) and variants 

Belief propagation has emerged as a powerful technique for approximate inference in 
graphical models. On graphs without cycles, belief propagation is exact and can be 
viewed as an application of the distributive law with multiple elimination order [351 Q]. 

It performs exact inference on graphs with cycles. In dense graphs, it provides an 
approximate solution by passing messages between factor nodes and variable nodes. 

The general update equations for the messages for the sum-product algorithm [25] 
are given as 




(2.4) 



Xj £Xq, 




n 




(2.5) 



h£N(x)\a 
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Figure 2.2: The sum-product algorithm update equations (a) message from variable i to 
factor a based on other neighbouring factors N(i) — {a}, m^^a^) = Y[h&N(x)\a m h-^i(xi) 
(b) message from factor a to variable % based on other neighbouring variables N(a) — {i}, 




where m a ^i(xi) (m^ a (xj)) is the message from (to) factor node a to (from) variable 
node i, and b(xj) is the marginal distribution at variable node i. 

Wainwright etal. jl3] presented an alternative approach which obtains the results 
for a graph with cycles, as appropriately reweighted convex combination of instances 
of classic belief propagation runs on trees. Yedidia etal. [49J used region based free en- 
ergy approximations to obtain an algorithm, GBP, with better convergence properties. 
Weiss etal. |Hj have recently investigated the relationship of BP with linear program- 
ming. 

2.2.2 Sampling Methods 

Sampling algorithms such as importance sampling and Markov Chain Monte Carlo meth- 
ods are an alternative approach for solving approximate inference problems [281 H] • The 
algorithms exploit the Markov blanket property of graphical models: simply, that con- 
ditioning a node on its markov blanket ( which is just the set of its neighbours in the 
undirected graphical model), renders it independent of all other variables. 
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The method maintains a proposal distribution from which samples are drawn and 
accepted or rejected so that the resulting distribution matches the requirements. The 
Metropolis Hastings algorithm draws the samples based on the current state s.t. the 
distribution is g(x(* +1 )|xw), and the states x^ 2 \ . . . form a markov chain. 

The Gibbs algorithm samples a distribution by replacing the value of one of the 
variables by drawing from a distribution conditioned on the remaining variables. For 
example, the sample distribution p(x) = p(xi, X2, ■ ■ ■ , x n ) is sampled at each time to 
get the new iterate as drawn from ~ p(xi\y^ t \i) where x^w is the value of the 

remaining variables. Similarly, for the next iteration, another index j is chosen based on 
some order, and the value x^ 2 ^ updated based on xf* +1 . 

The algorithms are simple to implement and provide theoretical guarantees of con- 
vergence. However, the convergence is often slow. Newer methods have been developed 
which ameliorate the problems to an extent: slice sampling [30J which adapts the step 
size in the Metropolis algorithm to match the characteristics of the distribution, Hybrid 
Monte Carlo algorithm |29j which can make large changes to system state while keeping 
the rejection probability small. 

Sampling methods are most commonly used in case of arbitrary graphs with no 
structure. 

2.2.3 Variational Methods 

Variational methods refers to techniques which obtain a solution by posing it as the 
result of an optimization problem. The finite element method [41], maximum entropy 
estimation [19], mean field methods [31] are instances of variational methods. In this 
thesis, we focus on the Naive Mean field Method in statistical mechanics. 

Naive Mean Field Method 

This method approximates an intractable distribution p(x) = p(x\, . . . , x n ) by a dis- 
tribution g(x) belonging to a tractable class of distributions Q which minimizes some 
distance measure D(p,q) within the class Q. The Kullback Liebler divergence KL(q\\p) 
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is often chosen as the distance measure for tractable computation, given by 

KL( q \\ P ) = Y,q(x)^ q ^ (>0) (2.7) 

The method was originally developed in statistical mechanics dealing with Ising spin 
systems. The spin systems have a probability distribution given by the Boltzmann 
distribution 

P --ff(x) 

P(*) = (2-8) 
where x = {x\, . . . , x n } are binary (spin) variables X{ G ±1 and 

H(x) = Jij x i x j - h * x i ( 2 - 9 ) 

The partition function is given Z = J2x e ~ H ^ ■ The direct computation of the partition 
function involves 0(2 n ) computations, which is not possible for real large scale systems. 
The method attempts to compute an approximation to the free energy — In Z as follows: 

KL(q\\ P )=\nZ + E(q)-S(q) (2.10) 

where E(q) is the variational energy given by E(q) = J2 X <?( x )-^( x ) an d S[q] is the 
entropy of the distribution q given by S(q) = — <?( x ) hig(x). 

The mean field approximation considers the distribution g(x) from the class Q of 
product distributions, i.e., 

q(x) = f[q J (x j ) (2.11) 

i=i 



For the enthalpy function H(x) in (2.9), the above equation may be rewritten in terms 
of variational parameters rrij as 

(1 + j;;m,') 

<lMr>»j) ./ J (2-12) 

where rrij are identified as expectations rrij = E q [xj). The variational energy E(q) and 
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the entropy S(q) can be simplified as: 



f 1 + vrij 1 + vrii 1 - m, 1 - 
2 11 2 2 n 2 

2 



} 



(2.13) 



E(q) 



(2.14) 



i,3 i 



The variational free energy F(q), is given by 



F(g) = E(q) - S(q) 



(2.15) 



and it is an upper bound on the true free energy — In Z as 



KL(q\\p) > (Using (2.10)) 



=>--lnZ < E(q) - S(q) = F(q) 



(2.16) 



Minimizing the variational free energy F(g) w.r.t. the parameters rrij yields the set of n 
mean field equations, given as: 



The mean field method uses the independence of spin probabilities, i.e., E q [xiXj\ = 
Eq[xi]E q [xj], in order to obtain a tractable approximation to the computation of the 
free energy. We discuss the Bethe-Pierls approximation in the following section, which 
considers joint pair beliefs in addition to node beliefs. 

Bethe-Pierls Approximation 

The Bethe-Pierls approximation derives the variational free energy in terms of single 
node beliefs qi{xi) and pair beliefs qa(xi,Xj) with the normalization condition 




(2.17) 



3 
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The average energy is given by 

E (o) = X! • / '//' /( ''-••''/) -^hiQiixi) (2.18) 

ij i 

The average energy is exact if the node beliefs qij(xi,Xj) and qi(xi) are exact [51]. The 
entropy is given by the approximation 

SBethe{q) = - 'I'M '■ ■'',) In % (./-,..;■/) + ^(d; - 1) g^) In gj (x^ (2.19) 

where di denotes the degree of variable node i, i.e., the number of factor nodes connected 
to node i. The entropy expression is exact for singly-connected graphs [51], and the 
overall belief can be expressed as 

6(x) = UhiM^lpl (2.20) 

The variational free energy is given by Gsethe = E(q) — SBethe(o)- It can be shown that 
for a singly-connected graph, the beliefs obtained by Belief Propagation correspond to 
global minima of the bethe free energy. Further, for any general graph, the set of beliefs 
give a BP fixed point if and only if they are local stationary points of the Bethe free 
energy [51 J. 

Other methods such as the Cluster Variation Method (CVM) [201 [27] consider hierar- 
chy of localized clusters and approximate the system entropy in terms of the distributions 
on the localized clusters. 

In the following section, we present a brief description of the Cluster Variation 
Method. An extended review of the method and its applications is found in 
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2.3 Cluster Variation Method 

The cluster variation method is a generalization of the Bethe-Pierls approximation. It 
computes the approximate free energy in a statistical system under equilibrium condi- 
tions. 

The variational free energy is given by 

F = —\nZ = min J-(q) = min g(x)if (x) + g(x) In g(x) (2-21) 



The underlying idea is to treat the energy term in (2.21 ) exactly and to approximate 
by means of a truncated cumulant expansion. A cluster or region x a C x is a subset 
of a factor graph, such that, is a factor node f a belongs to x a , then all variable nodes 
Xi connected to f a also belong to x Q . Given a cluster x Q , the energy i7 Q (x a ) and the 
probability distribution on the cluster, q a (-x a ), are defined as: 

tf a (x a ) = £# a (x a ) (2-22) 
g a (x Q ) = £g(x) (2.23) 

x\x a 

where if a (x a ) denotes the energy contribution of factor node f a - The cluster entropy, 
S a , is given by 

S a = -^g Q (x Q )lng a (x Q ) (2.24) 

The entropy cumulants have the following relation any cluster a and all its sub-clusters 

13 C a, 

S a = £ S/j (2.25) 
where the entropy cumulant 5^ is given by means of a Mobius inversion as 

S fi = £(-l)"«- n ^ Q (2.26) 

oC/3 

where n a denotes the number of variable nodes in cluster a. The variational free energy 
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in (2.21) can then be written as 



^) = $>(x)tf(x)-£^ 



(2.27) 



where the second summation is over all possible clusters. 

The above equation is exact. An approximation is made by choosing a set, TZ, of 
maximal clusters and their sub-clusters such that each factor node is present in at least 
one cluster. Then, the cumulant is truncated by retaining only the terms of the cumulant 
present in TZ. 



(i (3eK a£_K 



(2.28) 



where the coefficients c a are known as Mobius numbers and satisfy 



c a = 1 Wa G TZ 



(2.29) 



The free energy in (2.21) is approximated as 



F(q a , a G TZ) = c aFa(q a ) 



(2.30) 



where T a (q a ) is the cluster free energy for a, given by 



Fa{q a ) = q a (^ a )H a (x a ) + g a (x a ) lng a (x c 



(2.31) 



The method involves minimization of the approximate free energy in (2.30) w.r.t. the 
cluster probability distributions {q a , a G TZ}, subject to the constraints, 



^g Q (x Q ) = 1 Wa G TZ (normalization) 



(2.32) 
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(b) 



Figure 2.3: (b) A factor graph with a set of clusters with associated counting numbers. 
The maximal clusters are A and B, with sub-cluster c, with the relation, c C A and 
c C B. (a) A valid region graph with the associated counting numbers. 

g„(x Q ) = ^(x^j) Vx^, P C a eTZ (compatibility) (2.33) 

X a \x,3 

2.3.1 Example 



We consider the factor graph shown in Figure 2.3.1 The probability distributions on the 
maximal clusters A and B and the common sub-cluster c, should satisfy the normalization 
and compatibility constraints. For example, the normalization constraint for cluster A 
is given by 

qA(x 1 ,x 2 ) = 1 

Xl,X2 

and the compatibility constraint between cluster A and sub-cluster c is given by 



2.4 Generalized Belief Propagation (GBP) 

Yedidia etal. jlSl EI] investigated the relationship between the Belief Propagation al- 
gorithm and the variational methods. It has been shown that the belief propagation 
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attempts to solve the fixed points of the variational free energy in (2.21 ) using the Bethe- 
Pierls approximation, where the big regions consist of individual factor nodes with their 
associated variable nodes; and the small regions consist of individual variable nodes, 
thus, establishing a close relationship between the two approaches. 

Yedidia etal. [121 ED] developed a new iterative update algorithm, Generalized Belief 
Propagation (GBP), closely related to the Cluster Variation Method. They proposed 
the idea of region-based free energy approximations which simplified the constraints in 



(2.32) to counting constraints only on the factor and variable nodes, i.e., 



c a = 1 Va (factor node) (2.34) 



a,a£a 



c a = 1 Vs (variable node) (2.35) 



a,s6a 



Once, an appropriate region graph is constructed which satisfies (2.34), the region 
beliefs are updated according to the contribution from messages from the neighbours 
until the beliefs converge. We now describe the parent-to-child version of the GBP 
algorithm. 



2.4.1 Parent-to-child algorithm 

The parent-to- child algorithm passes messages from the parent region to the child region. 
Each region R has a belief &r(x p ) given by 

b R (x R ) oc Y[ / a (xa) • JJ mp_ jB (x jR ) • I II IT m P /_ +Z) (x D )(i36) 

aei? \PeV(R) / \ DeV(R) P'eV(D)\£(R) / 

where V(R) is the set of regions that are parents to region R, T>(R) is the set of all 
regions that are descendants of region R, £(R) — R U T>(R) is the set of all regions that 
are descendants of R and also region R itself, and V(D)\S(R) is the set of all regions 
that are parents of region D except for region R itself or those those regions that are 
also descendants of region R. 

The message update is given by 
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Figure 2.4: An example of a region graph. The belief at region R is given by b R oc 
mA^RrnB~,B m c-*E'fnc->HrriF^Hl\a&R /a( x a)- The parent-to-child message update from 



RtoEis given by m R _+ E (-x E ) :- 



£R\E 



/a(Xa) 



™D->g(xg) 




/ \ / 



/ 



H 



Sx P \ fl riaeF PXJ , ri(/,j)eJV(p,K) 
m P ^ R (x i? ) := ^ — i — (2.37) 

ll(i,j)eD(p,R) mi-*j(xj) 

where N(P, R) is the set of all connected pairs of regions (J, J) such that J is in £(P) 
but not £(R) while / is not in S{P). D(P,R) is the set of all connected pairs of regions 
(I, J) such that J is in £(R), while J is in £(P) , but not £(R). 

Example 



We consider the region graph in Figure 2.4 The belief 6r(x#) at region R is the 



product of its local factors Y[ a &R fai^-a), the messages from its parents ^^(xjj), and 
ms- > fl(xij), and the messages into descendants from other parents who are not descen- 
dants: m^ B (x E ), m C -,H{xH) and m F ^ H (x H ). 
The belief at region i? is given by 



6 fi oc m A ^ R m B ^ R m c _ E m c ^ H m F ^ H \{ / a (x a 
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Similarly, for the message update from parent R to child E, the sets N(R, E) = {(A, R), (B, R)} 
and D(R, E) = {(D, G)}. The message update equation is given by 

/ x Ex R \x E m A ^ R (x R )m B ^ R (x R )T[aeR\Efa{Xa) 
^R^E\?^E) '■— / s 

2.5 Problem Definition 

We have introduced the Cluster Variation Method and the conceptually similar Gener- 
alized Belief Propagation algorithm . We note that the techniques described above are 
applicable to the static inference case, i.e., when there is no temporal evolution in the 
probabilities, and a single run of the algorithm provides the approximate beliefs. 

However, we are interested in the problem of tracking the evolution of marginal beliefs 
, not just their steady states. We now present the problem of inference on distributions 
evolving over time. 

We consider a system of N discrete- valued random processes X(i) = {X (t), . . . , X^_i(t)} 
having the Markov property, i.e., for times < t 1 < . . . < t m -\ < t m , we have 
P(x* m |x t ™- 1 , . . . ,x t:L ) = P(x' m |x* m - 1 ), where x* stands for the system having state 
{x , . . . , xn-i} at time t and P(x*) represents the probability of the system having state 
x* at time t, i.e., P(X(t) = x*). 

Given a conditional probability distribution p(x* +<5t |x*) which indicates the probabil- 
ity that the system has state x* +<5 * at time (t + 5t), given that it had state x* which can 
be factorized into localized interactions, denoted using the factor graph notation: 

p(x t+ V) = -J-T, II U^ t+St a\^a) = -J-t, exp{-Pf (x*+ V)} (2-38) 

where P(x* +<5 *|x') = — J2a<=A m /a(x* +<5 * a |x* a ) and the conditional partition function is 
Z(x*) = X)x« YiaeA /a( x<+5< a| x< a)- Given an initial trial probability distribution {6(x*)} 
at time t — 0, we wish to find {6(x*)} W > 0. 
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2.6 Extended GBP formulation 



We now extend the GBP algorithm for the problem mentioned above. Let 1Z 



static 



{(a s , c Q J} be a valid region graph for the static GBP case, satisfying (2.34). We consider 



the discrete time case, where the "space-time" factor graph is given by 
Mx m |x*) = -^n/a(4 +1 |xl) = ^ 1 7Y exp{-i7(x' +1 |x t )} tG{l,...,T} (2.39) 

Z ( X )aeA Z ( X ) 

and initial probability distribution {6(x')} at time t = is given. 

The set of regions, Rd yn for the extended GBP algorithm consist of the following: 

• The joint beliefs at time t and t — 1, a*'*^ 1 = {a*, a* -1 } with counting number 
c a t,t-i = c as and factors T[ a( , as / a (x* +1 |x* ) . 

• The single state beliefs at time t, given by a 1 = a* with counting number c a t = —c as 
an no factors except initial at time t — which will have factor 6(x° ). 

Then, each factor a at for the time t\t — 1 is part of one joint region a*'* -1 , while each 
variable node i at time t is part of three regions: one past-present evolution region a*'* -1 , 
one present to next time evolution region and one for current beliefs at time t, a*. 

The counting numbers for a variable i, and factor a at time t are given as: 

c a t,t-i = °a s = 1 ^ a (factor) (2.40) 

c a t,t-i + c a'+ i ' t + c «* = c a s + c <*s + H _c "s 

= 1 + 1-1 = 1 Vi (variable) (2.41) 



which satisfy the GBP constraints (2.34). 

A key requirement in inference over time is causality, i.e., the future states should 
not affect the past states. This is achieved due to the additional state regions a*, which 
restrict the backward flow of messages from joint regionsa* +1,t to a l,t ~ l . The message 
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passing is in the forward direction, as illustrated by Figure 2.5 This is similar to the 
forward filtering approach in [52J. The message update rules are influenced by the 
direction of time and are no longer generic. 

We present an alternative formulation in the next chapter, which is naturally suited 
to such evolution problems. 



Summary 

In this chapter, we have briefly reviewed the existing techniques of inference over graph- 
ical models. We focus on the variational methods of approximate inference, and the 
relation between variational free energy and belief propagation algorithm. 

The next chapter introduces the dynamic equivalent of the Cluster Variation Method 
known as the Path Probability Method, which is a powerful technique that allows us to 
track the changing beliefs. We derive an alternative algorithm, DynBP, based on PPM, 
which is naturally suited to handle inference over time. 

We shall revisit the GBP algorithm in Chapter [4] to consider the extension of the 
GBP algorithm to inference over time. 



CHAPTER 2. GRAPHICAL MODELS 



20 



Figure 2.5: Figure presents an extended region graph for inference over time, (i) Region 
graph with underlying factor graph. The regions are marked as a) a*'*" 1 , b) a t,t+l , and 
i) a with associated factor 6(x°). (ii) Figure shows the direction of message passing in 
the region graph which maintains causality. This is similar to forward filtering pass in 
spatio-temporal MRF of [52] 
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Chapter 3 

Path Probability Method 



Path probability methods (PPM) was first studied by Kikuchi [2T] in the context of 
dynamic evolution of global ensemble quantities such as magnetism in spin glass systems. 
It has been successfully applied to a number of problems in statistical physics such as 
hopping conduction problems of many classical particles, lattice gas models, etc. A 
review of the theory and applications of the method in classical physics in found in J2JJ 
33j |3l] . The method is a powerful technique for studying temporal evolution of dynamical 
interacting multi-body systems. However, to the best of the author's knowledge, this 
thesis is the first attempt which extends and successfully applies this powerful technique 
to non-traditional problems in machine learning. 

We present a brief introduction to the method as it has been developed in the context 



of statistical physics in section [3A| We extend the ideas present to applications in graph- 
ical models in Section 3.2 and present the resulting algorithm, DynBP, in Section 3.3| 



3.1 Review of the method 

Path Probability Method applies the variational principle of minimum free energy to 
irreversible statistical mechanics. PPM defines path variables (analogous to the state 
variables in the CVM) which capture the information about the change of state of the 
system and constructs a path probability function (PPF) which expresses the probability 
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of a change occurring in terms of the path variables. Maximization of PPF leads to the 
most probable path of evolution of the system. 

The papers by [2T] and [15] investigated a homogeneous ferromagnetic Ising system 
with iV spin sites, with the Hamiltonian energy given by 

H{s) = -J SiSj - HoHj^Si (3.1) 

<i,j> i 

where Sj G {+1, —1} is the Ising spin on the ith site, < i,j > represents the nearest- 
neighbor pair, and fioH and J are constants. The probability of the system being in state 
x when in equilibrium is given by the Boltzmann distribution P(x) = exp{— 7i(x.)}/Z. 

State Variables 

The quantity of interest was the ensemble magnetism exhibited by the system. This 
depends on the total number of interacting as well as opposing spin pairs in the system. 
Thus, the state variables quantify the magnetism of the system under equilibrium, X s = 
{xi, yij}, i,j G {±1}, are the probabilities of finding a given spin state or pair interaction 



state at a randomly selected spin sites as given by Table [3TT| where z is the lattice number 
of the system, i.e., zN is the total number of spin pairs in the system. The state variables 
are inter-related probabilisties with the relations given by 

x+i + x-i = y+i-i + y+i+i + y-i+i + y-1-1 = i 
y+i-i + y+i+i = y-i+i + y+i+i = x+i 
y+i-i + y-i-i = y-i+i + y-i-i = x-i 

The hamiltonian, T~C(s), can be rewritten in terms of the ensemble state variables, X s , 

as 

H(*s) = ^— {(y+i+i + y-i-i) - (y+i-i + y_i+i)} - v HN{x +1 - x. x ) (3.2) 
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Table 3.1: State variables of the spin glass system, X s . The state variables represent the 
probability of finding a random spin site or spin pair in the given state. 



State Variable 


Mathematical Definition 


Probability of finding state 


x+i 




i=+l 




© spin 




TV ^1=1 ±& 


i=-l 




spin 


y+i+i 


-^-T ■ ■ 1 

zN ^<ij> J ~ s i = 


=+l,Sj 


= + 1 


© - © pair 


y+i-i 


±y .. 1 

zN ^><h3> A «i= 


=+l,Sj 


= -1 


© — © pair 


y-i+i 


ly .. i 


=-l,Sj 


= + 1 


© — © pair 


V-i-i 


i V • ■ 1 


=+l,8j 


= + 1 


© — © pair 



Path Variables 

The method studies the evolution of the state variables, X s , with time. It uses the notion 
of path variables, which capture the probability of change of the spins at a random site 
within a small time, St. The path variables are X p = {Xij, Yij^i}, k, I G {±1} where 
X it j is the probability that a node with spin i at time t will change to spin j at time 
(t + St), and Yij t M is the probability that a bond pair having state at time t will 
change to state (k, I) at time (t + St). The time St is chosen small enough such that at 
most one of the states changes in Yy^. 



The expression for the path variables are given in Table 3.2 where s\ denotes the 
state of spin site % at time t. The relations between state variables at time t, X\, and 
t + St, X* +5t , and the path variables X v are given as 



t+5t _ t 



Vi g ±1 



+ Yi-j,ij) (¥ij,i—j + Yij-ij) Vi, J G ±1 

The relations can be expressed in terms of independent variables as 

Y iijU = y\ i -2Y l (i) V?G±1 
y,,U = yti-\X2(i) + Y 2 {-i)] Vz,jG±l 



(3.3) 
(3.4) 



(3.5) 
(3.6) 



The change in the hamiltonian, A7i(X p ), can be expressed in terms of the path variables 
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Table 3.2: Path variables in the spin glass system, X p . The path variables represent the 
probabilities of the indicated change in state of a random spin site or pair. 



JT cLLIl V ell IcHUlt! 


Jr\. VJ VJL , 


I n QTl (V 


z> 111 oLdLc 


KVnT'DCCl T 1 


cCr\ 
* JJ 




s* 

I 






-^H-1,+1 




© 


© 


1 sr^N -i -I 

7v 2^i=i ^-s t =-\-i- L s t+5t — i-i 






e 


© 


1 1 1 

]V L>i=l 1 s*=-l 1 s *+' 5t =-l 


-^+1,-1 


X(l) 


© 


© 


1 1 1 

L % 




X(-l) 


e 


© 


1 v^N -1 -1 

N ^>i=l L 8 t i =-l 1 s \ +5t =+l 


1+1+1,+1+1 




© - © 


© - © 


^<i,j> 1 s|=l 1 s*=l 1 s *+«=l 1 s *+«=l 


1^-1+1,-1+1 or 1+1-1,+1-1 




© - © 


© - © 


^ E<i,j> lsj=-lls*. = ll s *+"=_il s *+' 5t = i 


1^-1-1,-1-1 




© - © 


© - © 


i/V E<i,j> l S f=-ll S *.=-ll s *+ 5 *=-ll s *+ 5 *=-l 


l+i+i,+i-i or l+i+i ,-i+i 


Yi(l) 


© - © 


© - © 


JN ^<i,j> 1 s*=l 1 s*.=l 1 s *+«=l 1 s *+'5*=-l 


y-1-1,+1-1 or 1^-1-1,-1+1 


n(-l) 


© - © 


© - © 


^<m> l s ?=-il s *.=-il s *+<»=-il s *+<»=+i 


l+i-i,+i+i or F_ 1+1)+1+1 




© - © 


© - © 


I/V E<i,j> l s *=-il s *=il s *+« = il s *+**=i 


l+i-i,-i-i or y_i+i ; _i_i 




© - © 


© - © 


E<i,j> l^=-il s *.=il s *+« = _il s *+«=_i 



AW(ATp) = ft(* s t+5 *) - H{Xl) 

= J2 2JzN[Yid) ~ W)] + 2 I 2 HN[X(1) - X(-l)} (3.7) 

i=±l 

Path Probability Function 

The probability of the path taken by the path variables, X p from initial state, X\ to final 
state, Xl +St , is given by the path probability function (PPF), analogous to the free energy 
in the Cluster Variation Method. The method constructs an approximation to the path 
probability function in terms of the path variables, in which the energy change is exact, 
but the entropy of change in approximated. Maximizing this variational formulation 
leads to the most probable path of change given the initial state. 

The path probability function, V, depends on three factors, namely, 

• The probability of occurrence of spin flip at a spin site in the time St, which is 
given by 

v x = n (est) Xi ^(i - est)** (3.8) 

i=±l 
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where 6 is the spin-flip rate. 

The likelihood of the change in state given by path variables, X p , which depends 



on the change in the hamiltonian, AT-C(X p ), in (3.7) given by 



P2 = eXp (-^^) 



where ks is the Boltzmann factor and T is the temperature. 



(3.9) 



The third factor is computed by observing the geometric relations present in the 
lattice. It is equivalent to the entropy in the cluster variation method, which in 
this case, is approximated in terms of the path variables. 



' p 
point 


z/2 


1 point 


z/2-l 


P ■ 
L 1 pair . 




M 





where 



(3.10) 



^point = UKNx^y.} 



pair 



n i( ny h4 



The final form of the path probability function [T5] is given as: 
1 



N 



U (i,j),(k,l) 



(3.11) 



+ £ [X it -t \n(98t) + X iti ln(l - 68t) - zK{Y x (i) - Y 2 (i)) - LiX{i)} 



i=±i 



where K = -r^f, L = and C(x) = x\nx — x. The most probable path of evolution v 



is obtained by differentiating V in (3.11) w.r.t. the independent path variables Y 8 (i) 



We note that the third factor in (3.10) is computed using counting arguments, and 
thus, is heavily dependent on the lattice structure. Further, larger graphs may have local 
variations in the state probabilities. We extend the ideas of path probability method 
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using the factor graph formalism to a general setting which addresses these issues. We 



3.2 Extension to graphical models 

We consider a system of N discrete- valued random processes X(£) = {X (t), . . . , X N _i(t)} 
having the Markov property, i.e., for times < t\ < . . . < t m _i < t m , we have 
P(x* m |x tm - 1 , . . . ,x 41 ) = P(x' m |x* m " 1 ), where x* stands for the system having state 
{xo, . . . , xn-i} at time t and P(x*) represents the probability of the system having state 
x* at time t, i.e., P(X(t) = x*). 

Given a conditional probability distribution j5(x* +<5 *|x*) which indicates the proba- 
bility that the system has state x' +<5 ' at time (t + 5t) and had state x* at time t, the 
conditional probability distribution can be factorized into localized interactions, denoted 
using the factor graph notation: 



P(x t+ V) = ^II Za(x' + Mx' a ) = _L- exp{-P(x^|x t )} (3-12) 



where P(x i+<5i |x') = — X^aeA l n /a( x * +<5i a| x *a) and the conditional partition function is 
Z(x t ) = y^ x t YiaeA /a( x<+5< a| x 'a)- Given an initial trial probability distribution {6(x*)} 
at time t — 0, we wish to find {6(x*)} Vt > 0. 

We use PPM to get a two-step iterative solution for finding {6(x')} Vt > 0. The first 
step is to compute an approximate joint distribution {6(x*' 5 *)} where x. t,St denotes the 
joint state (x* +<5 *,x*) which means that the system has state x* at time t and x* +<5 * at 
time (t + 8t), subject to the constraint 



where the criterion is minimization of the Kullback-Liebler divergence between {6(x* ,<5 *)} 



revisit the ising spin system in [A] in the context of the extended setting. 




(3.13) 
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and the probability distribution {^(x*' 5 *)} obtained by the relation 

p(x*- 5 ') =p(x* +5 *|x>(x*). 

The motivation is to express this minimization in terms of the beliefs {6 Q (x*' 5t a )} over 
the joint state {x t,St a ) = (x* +<5< Q , x* Q ) for a valid region graph {a : a E 1Z\. The beliefs 
{&Q,(x* ,<5t Q )} correspond to the path variables in the original PPM formulation. 

We note that the KL divergence between true joint probability distribution {p(x i '' 5 *)} 
and {6(x' ,<5t )} is given as 

D({b(^)}\\{p(^)}) = J D({6(x^)}||{p(x^)}) + J D({6(x t )}||{p(x*)}) (3.14) 

> D({b(^ 5t )}\\{p(^ 5t )}) (3-15) 



where p(x 4 ) is the true probability distribution at time t and p(x*' 5 *) = p{x t )p{x t+5t \x t ) . 

The second step computes the probability distribution {6(x i+<5 *)} according to the 
relation Y^ x t 6(x*' 5t ) = 6(x t+<5t ). Minimization of Kullback-Liebler divergence in (3.15) 
leads to the formulation 



min ^({6}) = H*{{b}) ~ S P ({b}) + S°({b}) (3-16) 

{fe(x*. ,5i )} 

where ^ rp ({6(x*' (St )}) is the logarithmic form of the path probability function (PPF) anal- 
ogous to the variational free energy T in CVM, the path conditional energy H p ({b}), 
path entropy S p ({b}), and the original state entropy S°({b}) are given by 

H p ({b}) = J2 b (^ t,St )H(^ t+St \^) (3.17) 

x t,<5i 

S p ({b}) = -^6(x*' <5 *)ln6(x*> 5 *) (3.18) 

x t,i5t 

S°({b}) = -^6(x f ' <s *)ln6(x t ) (3.19) 

x t,5t 

Analogous to the CVM case, the PPF T v can be approximated for a valid set of maximal 
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clusters 1Z with associated counting numbers {c a \ a G 71} as 

^({b a {^ 5t a ),a G 7Z}) = E c a J^({b a (^ 6t a )}) (3.20) 

where x 4,5 ^ denotes the joint state (x* +<5 * a , x* a ) for any cluster a, and the cluster path 
probability function for cluster a having associated path variables {& a (x' ,,5 * Q ,)} are given 
by 

Kiiba}) = H?({b a }) - S?({b a }) + S° a ({b a }) (3.21) 

and the region average path energy H' l ^({b a }), the region average path entropy S^({b a }), 
the region average state entropy S^({b a }) and the conditional region energy, H a {'K t+5t a \x t a 
are given by 



H p a ({b«}) = 


v t,8t 


(3.22) 


H a (jK a |x a ) 


-Eln/a(x' + Mx' a ) 


(3.23) 


S p a ({b a }) = 


- E b a (^ 5t a )lnb a (^ 5t a ) 

v t,St 


(3.24) 


S° a ({b a }) = 


A a 

- E &a(x t ' 5 * a )ln6 a (x* Q ) 

-«-£,5t 


(3.25) 



PPM reduces to minimization of the variational PPF (3.20) w.r.t. the path variables 
{6(x' ,<5 * Q )| x* ,<5 * Q , a G 7Z} subject to the constraints 

E &a(x*' w a ) = &«(x' a ) Vae?l (3.26) 

v £+ft 

E 6a(x*' 5 * a ) = V[3CaeH (3.27) 

A a\/3 

The choice of the trial conditional probability distribution p(x t+(5< |x*) is constrained 



as in (3.12). This is a necessary condition for the decomposition of the path conditional 
energy H p into cluster path energies {H^\a G 1Z} for a valid region graph to be accurate. 

The approximate probability distribution 6(x* +<5 *) is related to the cluster probability 
distributions {b a (x. t+5t a ), a G TZ} as b(x. t+5t ) oc Y[ a en b a (x t+St a ) Ca . 
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We present an iterative message passing algorithm to obtain the evolution of beliefs 
in the next section. 



3.3 DynBP Algorithm 



This section presents the iterative update algorithm, DynBF " which solves the opti 



mization problem having cost function (3.20) subject to the constraints (3.26) and (3.27) 



iteratively. We get the Lagrangian £ as 



C = E c * E M x *'a){ln& Q (x t ' 5t a )+H a ( X *><\)-ln6 Q (x' a )} 



+ E MO} 



+ E E a^x^h&^V) 



E &«(**'*«)} 



/3 



1 a\P 



(3.28) 



where A Q (x^) is the lagrangian coefficient corresponding to constraint (3.26) for each 
state x* a of region a, and A l g_ >Q! (x* ,<s * / g) is the lagrangian coefficients corresponding to 
constraints (3.27) for each joint state x*' 5 *^ of edge (a, 0) respectively. We denote 



m/ 3 -,a(^ t ' 5t p) = exp{A / 3^ a (x*' d * /3 )} and m a ^ 7 (x 



t,st 



exp{A a _^ 7 (x* ,<5 * Q )} as the mes- 



sages corresponding to the child region (3 and parent region 7 of region a respectively 
for each joint state x 4,5 ^, and m^x^) = exp{A Q (x* a )} as the message corresponding to 
the past state x* a for cluster a. 

Differentiating C w.r.t. b a (y^' st a ) and setting it to zero; and using the constraints in 



(3.26) and (3.27) respectively, we get an iterative update scheme for m l3 ^ a (-x t,St @) and 
w a (x*Q,) and 6 Q (x*' 5 ' a ) as: constraints in (3.26) and (3.27) respectively as: 



mi m) (x ( ( 



rn 



CO 



(X*a){ 



6 a (x' a ) 



— y 



rn 



a(x^){ 



ExM^V^) 



(3.29) 
(3.30) 



derivation of the DynBP algorithm is given in the appendix. 

2 We call the algorithm DynBP indicating BP over time for dynamic models. 
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t,St N _ Tlgga ./a(x' + *' |x' a ) f n^Ca m^X*^) ^ l/c a 

a 1 aJ ~ t \-i/ c s/ Irr ttft ^ J 1 J 

mi, (x'q) i/Ca X e ]IaC7 m "~^7 ( X a) 

where z denotes the iteration number. 

The DynBP algorithm passes messages from the child to the parent for each child 
joint state. The stopping criterion we chose is based on the maximum change in region 
beliefs compared with the previous iteration beliefs. DynBP allows handling of complex 
models, while maintaining temporal causality. 

The final algorithm is given in [T] 

Algorithm 1 DynBP 



input: p(x i+<5i |x*) on factor graph G = (V,E) as in (3.12); Maximal set of clusters 

{(a,c a ), a E TZ}; initial probability distribution {6(x')} at t = 

output: partial beliefs {6 a (x* Q )}, a E TZ at times t\ <t% < • • • < t max 

init: t <- 0; m a (x* Q ) = 1 Vet G TZ; m /3 _ Q (x t ' <5 * /3 ) = 1 V/3 C a E TZ 

repeat 

for all a E TZ do 
for all x* ,5 * a do 

Update b a (x t ' St a ) as in (B.6) 



for all 7 G TZ such that a C 7 do 



Update m a ^ 7 (x* ,<5 ' a ) as in (3.30) 



end for 
end for 
for all x* a do 

Update m Q (x* Q ) as in (3.29) 
end for 
end for 

if Stopping Criterion then 

Set b Q (x t+5t a ) <- E X * Q 6a(x*' 5t a ) 
Update £ <— t + 5t 
Reset m /3 ^ Q (x*' 5 ^) = 1 V/? C a E TZ 
end if 
until t > £„,„,. 
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Summary 

In this chapter, we have reviewed the Path Probability Method as it appears in statistical 
mechanics. We have extended the Path Probability Method to approximate inference 
over general graphical models. This has resulted in the DynBP algorithm. 

In the following chapter, we compare the DynBP algorithm with an implementation 
of GBP over the "space-time" factor graph. We also perform experiments to verify the 
accuracy and efficiency of the algorithm. 



Chapter 4 
Algorithm Analysis 



We analyse the DynBP algorithm in this chapter and compare it against existing ap- 



proaches. Section 4.1 explores the relation between DynBP and GBP algorithm. Sec- 



tion 4.2 presents the experiments for accuracy and efficiency of the algorithm. 



4.1 Comparison of DynBP with GBP 

DynBP is closely related to GBP. Standard GBP is a static algorithm, i.e., there is no 
concept of temporal evolution. In this chapter, we focus on the extension of GBP to 
handle temporal evolution of beliefs and its relation with DynBP. 

However, in cases where messages have to be passed only in the forward direction in 
time, DynBP is better suited to GBP. To illustrate, we consider an augmented "space- 



time" factor graph (3.12) with additional factors {6 a (x* a )|a e 7Z} given by 



P(X*+ 5 ', X*) OC n Ki^a) R /.(x*+« a |x* a ) X a C X (4.1) 

where P(x*) oc fIaeA^a( xt a) corresponds to the initial probability distribution at time t. 
Further, for any cluster a, the initial cluster probability distribution {P(x* Q )} is given 



by P(x t a ) oc n a ea^a( x *a)- The GBP algorithm yields the original cost function (3.20) 



subject to the original compatibility constraints (3.27) and a modified normalization 
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constraint 



£ Mx*-**) = 1 



(4.2) 



instead of the original normalization constraint (3.26). Proceeding as in the previous 
section, We obtain the update equations as: 



m 



(i+l) 



ni 



(0 



f 1 1 c Q 

^ExM t „^(x^ ^ 



^a( x a) riaSa /a( x 



; x e 



ngCa^(^) jl/c 
IlaC7 m "- > 7 ( x<,<5 *o) 



and mi^^x*' 5 *^) same as in (3.30). The GBP algorithm does not enforce (3.26), and 



(4.3) 
(4.4) 



hence, the prior distribution b ( ^ BP {x t a ) given by: 



GBP i t,St 
a l x c 



(4.5) 



may differ from 6 a (x t a ), while this equality is strictly enforced by the DynBP algorithm 
at the cost of additional messages m Q ,(x* Q ), one for each state x* a , instead of a single 
message m a for each cluster a. A particular advantage of DynBP over GBP would be 
when strict temporal coherence is needed, i.e., the messages from future should not affect 
beliefs at previous nodes. DynBP provides a natural way of implementing this scheme. 
We compare an implementation of DynBP with standard GBP run over the modi- 



fied space-time graph and show the results in figure The results indicate that the 
algorithms are comparable with the additional message requirement in case of the GBP 
algorithm. 



4.2 Ising Spin Experiment 

We consider the square lattice Ising model, which has N binary variables arranged in 
an L x L square lattice, and each variable node is connected to its nearest neighbors by 
a pairwise factor of the form f a (xi,Xj) = exp{JijXiXj} and has a "local magnetic field" 
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Figure 4.1: Ratio of the variational free energy T = —\nZ reported by DynBP and 
standard GBP implementation for N=200 trials on an Ising grid. 
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of the form, fi(xi) = exp{/ij2j}, where Xi G {±1} and the parameters Jy, hi are chosen 
from Gaussian probability distributions with mean and variance 0.1. We propose to 
solve the inference problem on the graph by the DynBP algorithm. 

We compare the DynBP marginal probabilities at each node w.r.t. the true marginal 
probabilities and find that the DynBP tracks the true marginal probabilities with a high 
degree of accuracy, usually, with nearly 50% cases having relative absolute error within 
10%, as seen from Figure 4.3| 



Figure 4.2 shows the evolution of beliefs at a random node at different values of 
85t, compared against the results obtained by loopy BP and the true evolution of belief 
obtained by exhaustive simulation. We note that at higher 8St, the spin is highly disposed 
to flip sign, and hence we see the oscillatory behaviour in node marginal beliefs initially. 



Figure 4.4 indicates that the belief compatibility between the parent and child regions 
is obtained within some iterations. It can be seen that the beliefs between child and 
parent regions reach compatibility within a few iterations, except for some residual error 
in some nodes. 
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Figure 4.2: The figure shows the evolution of beliefs at values of 85t being (a) 0.1, (b) 0.5 
and (c) 0.9 respectively. We note the oscillatory behaviour of the node beliefs reported 
by the algorithm due to high spin flip rate, (d) Evolution of belief at various spin flip 
rates at the same node. Lower values of 85t discourages rapid change of state. Note that 
at 65t = 0.5 the beliefs reach equilibrium in one iteration as the time factor is same for 
all states. 
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Figure 4.3: Histogram of ratio of deviation between reported beliefs and the true marginal 
probability against the true marginal probability at the node, at node (0,0) for 

various seeds averaged over the simulation period for a 3 x 4 Ising model at various field 
strengths: (a) h = 0.1, j = 0.5 (b) h = 1.0, j = 0.1, (c) h = 0.1, j = 0.1. 
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Figure 4.4: Maximum difference between state belief for child region and correspond- 
ing marginalization in parent region; and maximum change in region beliefs with each 
iteration. 
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Summary 

In this chapter, we have compared the DynBP algorithm with an equivalent GBP im- 
plementation. We have also verified the accuracy of the algorithm. The next chapter 
presents the application of the algorithm to some computer vision problems which exhibit 
spatio-temporal coherence structure that can be exploited using the algorithm. 



Chapter 5 
Applications 



This chapter presents the experiments and results. Section 5.1 presents the experiments 



on moving object detection. Section 5.2 presents the dropped frame reconstruction 



experiments. Section 5.3 presents the experiments on video denoising. 



5.1 Moving Object Detection 

Detection of motion in video has been a key problem in computer vision. Frame difference 
techniques have been widely used to detect motion based on change in pixel values [HI 
dU EH]- These techniques detect motion easily but often do detect the partial outline 
of the object under motion rather than the complete object. Other approaches include 
motion history image (MHI) [3] which uses a temporal template to detect human motion, 
optical flow based pES] , background model based history maps [T5] . 

Yin and Collins [52J first used the idea of spatio-temporal MRFs for detecting motion 
in video. In this thesis, we extend their work to a formal setting under the DynBP 
framework. 

Under the model, each pixel (m, n) at time instant k has a hidden state node s(m, n, k) 
which represents the likelihood that a pixel contains object motion, and a corresponding 
data node d(m,n, k), which represents the binary motion detection result computed by 
inter-frame differencing at time k. The network consists of nearest neighbor nodes in 
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space and time, i.e., a node s(m,n,k) has four spatial neighbors s{m ± l,n, k) and 
s(m, n±l,k) and two temporal neighbors s(m, n, k ± 1). The joint probability over the 
3D volume is given as P(si, • • • , s N , di, ■ ■ ■ , c?Ar) = Il^j ipij{si, Sj) T[ k <f) k {sk, d k ) where Si 
and iij represent the state node and the data node separately. <p(sj,dj) represents the 
measurement relation between observation dj and hidden state node Sj. If dj equals zero, 
no motion is detected at the pixel by inter-frame differencing. The measurement relation 
(p(sj,dj); Vsj G {1, • • • , C} is given by 4>j{sj = p, dj) = ^ if dj = else (fij(sj = p, dj) = 
5(p = C) where C is the number of quantization bins for the belief at state node. The 
state transition function ipij(si,Sj) is given as ipij(si = p, Sj = q; 9) = 9 if p = q else 
= P, Sj = q; 9) = e where 9 and e are related as < 9 < 1, e = (1 — 9)/(C — 1) and 
9 3> e. We define a closely related formulation of the conditional probability function 
]3(s* +1 |s*) at any time instant t as 



fij{si,Sj) = ipij(si, Sj] 9 S ) where state nodes and Sj are spatial neighbors. The time 
compatibility is captured in /j(s- +1 |s-, <i- +1 ), 



where we choose /• = max(0, s\ — 1), which acts as a decay function in absence of binary 
motion detected by inter-frame differencing(<i- +1 = 0). We construct a valid region graph 
using Bethe approximation, i.e., large regions consisting of spatially neighboring (sj, Sj) 
and small regions Sj, and use MAP rule on each small region as s 4 MAP = arg max Si bi(si) at 
each time instant. This is conceptually similar to forward spatial-temporal BP reported 
in the original paper. Figure 3 shows the results of the algorithm applied to a 80 x 50 x 240 
video. 



(m) 



(5.1) 



where /ij(s* +1 , s* +1 ) corresponds to the spatial compatibility function and is given by 




(5.2) 
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Figure 5.1: Moving Object Detection Experiment: Random Patch: (a) There is a random 
5x5 patch moving on 50 x 50 background. The three vertical frames shows the image, 
inter- frame difference and the DynBP result respectively for the random patch. [52] report 
that the camouflaged object cannot be adequately detected by histogram based detector. 
Median and Gaussian filters do not localize the object position well. We observe that our 
result are similar to the forward spatio-temporal BP in [52]. Real Video: (b) The original 
frame and (c) DynBP result for frames 20, 90 & 160 for 6 S = 0.99, 9 t = 0.6, C = 2, where 
the input frame is quantized to 8 bins to reduce jitter. 
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5.2 Dropped frame reconstruction 

Video completion and dropped frame reconstruction have gained increasing importance 
due to recent growth in online videos and mobile devices. A critical parameter is the 
duration of video frames to be filled which differs based on the task. Several approaches 
using incremental completion have been investigated. Wexler etal. jl5] use patch based 
techniques to fill in the missing pixels. Jia etal. [17] use tracking and fragment merging 
to fill in the missing patches. Cheung etal. [5] uses probabilistic representations of the 
video (epitomes) to fill in the missing patches. 

A key problem with applying inference algorithms to video completion is the state 
space size which is exponential in the number of values that can take. For any region a, 
we use an approximate conditional probability which is dependent only on the number 
of variables |a|, present in region a, i.e., p^ +5 'l*(x^ +5 '|x^) = p^ 5 *'*(x^ +5 ' |x^). 

A reduced search space is obtained by considering the high probability next states 
given the current state for each region. The search space is then augmented corresponding 
to the extra states which are present either in the child or parent of current region, thus 
obtaining a closed search space on which the algorithm can be run. 

We use the following preprocessing algorithm for getting a valid search space for the 
PPM algorithm: 

1. PruneCandidateList(): This step selects only the candidates with high probability 
given the current state 

2. AugmentParent() : Add the missing states in the child region corresponding to 
parent region state 

3. AugmentChild() : Add the most likely states in the parent region such that all 
states present in the child region have equivalent representation in parent region 

4. We use approximate nearest neighbour search to find the best matching candidates. 

In our experiment, we consider a 300 x 100 x 240 video sequence which is quantized 
to 8 levels, and 5x5 regions. We find that each dropped frame reconstruction takes 
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Figure 5.2: This experiment reconstructs the dropped frames when receiving streaming 
video over the internet. The frame-dropping is modeled as a Bernoulli process similar 
to [5]. The leftmost and rightmost frames are the true normalized frames 150 and 157 
respectively, while the predicted frames 153 and 155 are shown in the middle. 



3 — 4 seconds. Figure 5.2 shows the application of the PPM algorithm to te problem of 



dropped frame reconstruction for streaming video broadcast. 



5.3 Video denoising 

Video denoising has become an increasingly important video processing task with the 
rapid growth of multimedia technology, since distortion of a video is inevitable during 
its acquisition, processing, storage and transmission [23J. 

Several video noise reduction techniques have been investigated which use spatio- 
temporal filters in the pixel domain. Some examples include the adaptive weighted 
averaging filter [32J, adaptive recursive least square filter [22] and motion compensated 
Kalman filter |47j . Alternative methods have explored filtering in the transform domain 
such as wavelet transforms [371 |53j G3]. There exists a high correlation among neigh- 
bouring frames of a video, since the motion among such frames is small. This makes it 
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well suited for application of DynBP algorithm. 

Video denoising [Sj is posed as an inference problem, where, some of the pixel values 
are missing and the missing pixel values are filled using the MAP estimate, i.e., those 
values which have maximum probability given the evidence posed by the known pixels. 
We consider a 352 x 240 video in which the red, green and blue components of the pixels 
and missing with 50% probability. A high variance noise a 2 is added to the missing 



pixels. The video data which quantized to 8 levels. Figure 5J3 shows the results of 
applying the algorithm to the problem of video denoising. 




Figure 5.3: The 352 x 240 video has its red,green and blue pixels missing independently 
with 50% probability. A high variance noise is added to the missing pixels. The top 
frame shows the noisy video, and the bottom frame shows the result of denoising. 



Chapter 6 



Conclusions and Future Work 

We have investigated the Path probability method present in statistical mechanics. We 
have extended it to handle approximate inference over time in graphical models. We 
have verified the accuracy of the algorithm by comparison against existing approaches. 
A special formulation based on the Generalized Belief Propagation algorithm has been 
shown to be equivalent to the Path Probability Method. 

We have demonstrated the applicability of the algorithm to problems in computer 
vision which have special temporal evolution characteristics. 

6.1 Future Work 

A key point of interest is the extension of the DynBP algorithm to handle continuous time 
evolution. The update equations take the form of stochastic differential equations, and 
have to be treated differently. Tractable solutions may be obtained for some conditional 
evolution distributions. 

Another direction of research is application of DynBP to the problem of multiple 
object tracking with different speeds. 
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Appendix A 

Ising System Reformulation 



We revisit the homogenous ferromagnetic Ising system in Section |3.1| and show how it 
can be recast in the extended setting. We consider a region graph based on the Bethe- 
Pierls approximation, consisting of TZ = TZ S U TZl-, where TZs and TZl are the small and 
large regions. There are N small regions, each consisting of a single variable node. There 
are Nz/2 large regions, each consisting of a single factor node and all the variable nodes 
neighbouring the factor node. The counting numbers for each region are given by 

c a = i- c p 

f3£S(R) 

where S(R) is the set of regions that are super-regions of R. Then, 



(A.1) 



{ 1 Va G ft L 

c a = [ 

1 — z Va G TZs 

The conditional region energy, if Q ,(x t+<5 * Q |x* Q ), will then be given by: 



where N a is the number of variable nodes in the cluster, i.e., N a — 1 if a G TZl else 
N a = 2 if a G TZs] N[ is the number of variable nodes that have flipped between times 
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t and t + St; and 7i a (x a ) is the cluster hamiltonian given by: 

r -^oHEi-(±ni 1{Sa=i} Vx a G ft 5 

W = { { 1 (A.3) 

Ei, j = { ±iy{-^H(i + j) - Jtj} 1 ^^ Vx a G ft L 

The key simplification to obtain the original Kikuchi PPF from the general formula- 
tion is to assume that the clusters are indistinguishable. This implies that all clusters of 
same dimensions have the same initial probability distribution as well as the same path 
variables clusters as well as the joint probability distributions. In other terms, 

6 Q (x* Q = i) = Xi Va G TZs; (A.4) 
b a (x f a = {i,j}) = Vij \/aen L (A.5) 



and, 



b a (x t+6t a = j,x t a = i) = X id Vae1l s (A.6) 
b a (x t+ * t a = {k,l},x t a = {i,j}) = Y im Vaen L (A.7) 

where X s = {xi,yij} and X p = {Xij, Yij,ki} are the state and path variables in the 
Kikuchi formulation respectively and i,j,k,l G {±1}- The state entropy S°({b}) is 
given by: 

S°({b}) = -[£ E b a (x t > st a )\nb a (x t a )+ E (1-z) E 6 a (xM* a )ln6 a (x^}8) 

"[ E h &«(**«) + E (l-^)E 6 «(x*a)lB6 a (x*a)] (A.9) 

x' a a£TZs x* a 

= -[— E yylnj/y + JV(l-zr) E ^ ln ^] ( A - 10 ) 
»je{±i} ie{±i} 



In the above derivation, we have implicitly used (3.26), which reduces ^"({fe}) to a 
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constant. The path entropy S' p ({6})is given by: 

S p ({b}) = -[E E b a (x t ' st a )lnb a (x t > 5t a )+ J2 (1-*) E 6a(^a)ln6 a (3^U) 

= -[-=- E Y ijM \nY ijM + N(l-z) E •V,,ln.Y,. ; ; (A. 12) 

y,f=,ie{±i} ie{±i} 

The region graph formulation and the associated counting numbers have the single count- 
ing property such that each factor and variable is counted exactly once. This property 
and the indistinguishable property result in the path energy as : 



H p ({b}) = E c « E & a (x^ Q )lni7 a (x t+ « a |x* a ) (A.13) 

= -H(x. t+5t ) + H(x t ) + N E {^,-ihi(^t)+X M ln(l-^)XA.14) 

i={±i} 

= -ft(x'+ 5 ') + W(x*) + A/ ln(05t) + ( A - A/) ln(l - 05t) (A. 15) 



The term {— Ti{x t+5t ) + 7Y(x*)} denotes the change in energy between current and next 
state and is expressed in term of path variables as: 

l{-^(x' +5t )+H(x')}= Z -J E Y idM (kl-ij)+ t t H'EX i jti-i) (A.16) 

i,j,k,l i,j 

We can write the variational free energy as: 

T = S p ({b})-S°({b}) + H?({b}) (A.17) 

= E i x i,-i Host) + x i;i in(i - est)} - ^{H(x t+5t ) + w(x*)} 
i=±i ^ 

Nz 

-— E Y ijM lnY ijM - N(l - z) E X id lnX id 
id,k,ie{±i} *6{±i} 

Az 

+[~^ E ^ln^ + A(l-2) E ^ ln ^] ( A - 18 ) 

i,je{±l} ie{±i} 



which is minimized subject to the constraints (3.26) and (3.27). The logarithmic form 
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of the original path probability function V is given by 



L \nV(X p ) = ^{Z i) _ i l n (^t)+X M ln(l-^)}-^{W(x*+ 5t )+W(x*)} 

H Y ijM ln Y ij,U ~ ( 1 - Z) X i,j 111 X hj + COI1St ( A - 19 ) 



N 



2 . 



i,j,k,le{±l} 



ie{±i\ 



by using the Stirling approximation and neglecting constant terms. The above term 
is minimized w.r.t. the path variables, subject to the normalization and consistency 
constraints. 



Comparing (A. 18) with (A. 19), and observing that S°({b}) reduces to a constant 



value due to (3.26), we find that the PPF formulation is equivalent to the DynBP formu- 



lation using Bethe-Pierls approximation and the trial conditional probability distribution 
given by: 



p(x* +5t |x*) oc exp{-H(x 



t+5t\ 



W(x')} x (05t) Nf (1 - 65t) 



\N-N* 



(A.20) 



This completes the proof. 



Appendix B 



Derivation for DynBP 

In this section, we present the derivation for message update equations of the DynBP 
algorithm. The DynBP optimization problem is: 

mm E*enc a { £ x *,« a b a {^ St a ){\nb a (^ 5t a ) - E a£ a In / a (x*+ 5 * a |x* a ) - ln6 Q (x t a Jp}l) 

6a(x*.' , * C( ) 

si. E x *+« a &a(x*>** a ) = 6 a (x' a ) Va (B.2) 

Ex«.« aXfl = M x *'%) V/3 C a (B.3) 

Writing the lagrangian, we get 

C = E c " E b a (x t ' s \){\nb a (x t ' s \) + H a (x t ' s \)-\nb a (x t a )} 

+ EE^(»(x f a )-EMxf)} 

+ E E A^ Q (x^^){^(x^V) - E &«(**'*«)} (B.4) 

Differentiating £ w.r.t 6 Q (x*' <5 ' Q ,) and setting it to zero, we get: 

c a {ln6 Q (x t >«* a ) + 1 + H a ^ t+St a \^ a ) - ln6 a (x* a )} 

-A a (x* Q ) + J2 Aa^ 7 (x*' 5 * Q ) - £ A /3 ^ Q (x t >%) = 
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which results in the update equation for 6 a ( x a) of the form 



ln& a (x*»** a ) 



-1 - H a (^ t+6t a \^ a ) + \nb a (^ a ) + -A Q (x* a ) 



a (3Ca 



(B.5) 



or the equivalent, 



baj^a) Ua &a / Q (x t+ ^ a |x f Q ) , U P Ca m^afe'^) \ 1/c 

m a (x* a )-Vcc» x e in 



, , , ( -v-t,<5t ^ J 
aC7 o— >7V a/ 



(B.6) 



where m^afx*'^) and the messages corresponding to the child region 

(3 and parent region 7 of region a respectively, given by: 



TTZq, ( 



expfA^fx 4 ^)} 



m a ^ 1 (x t ' 5t a ) = exp{A 



fx 4 ' 5 * 



)} 



(B.7) 
(B.8) 



for each joint state x*' a . The message m Q (x* Q ) corresponding to the past state x* a is 
given by: 



«\*( x *a) = exp{A Q (x* Q )} 



(B.9) 



In order to get the iterative update equation for m a (x t a ), we observe from (B.6) that at 
the z-th iteration, 



E 



„, oc m«(x* a ) 1/cD 



A+St 



(B.10) 



Comparing (B.10) with (3.26), we get: 



t+St 



m 



«(x* a )Vc 



m 



x 4 a ) 



6„(x* r 



(B.H) 
(B.12) 
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Similarly for m J g_ >a (x*' ^g), we observe from (B.6) that at the z-th iteration 



E ^0 



x t,st 



<5f 



(B.13) 

(B.14) 
(B.15) 



Comparing (B.15) with (3.27), we get 



1.0 



(x 



m^ Q (x*»« /9 ) 1 /«-+i/^ 



m 







a (x<^){ 



&«(x'^) 



a\/3 



6^(x*.« a ) 



Thus, we get the updated messages using (3.29) and (3.30), which are then used in (B.6) 



to get the next iteration values for 6^ +1 ^(x* ,,5 * Q ). This completes the proof for DynBP. 
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